We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory applies to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, and policy evaluation and $Q$-learning for tabular Markov decision processes.
Translated by Google Translate
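We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{-\frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identifiable Gaussian mixture in a univariate setting, where we prove that the EM algorithm converges in order $n^{\frac{3}{4}}$ steps and returns estimates that are at a Euclidean distance of order $n^{-\frac{1}{8}}$ and $n^{-\frac{1}{4}}$ from the true location and scale parameters, respectively. Establishing the slow rates in the univariate setting requires a novel localization argument with two stages, each involving an epoch-based argument applied to a different surrogate EM operator at the population level. We demonstrate several multivariate ($d \geq 2$) examples that exhibit the same slow rates as the univariate case. We also prove slow statistical rates in higher dimensions in a special case, when the fitted covariance is constrained to be a multiple of the identity.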